Mining Redescriptions with Siren

Authors

  • Esther Galbrun
  • Pauli Miettinen
Abstract

In many areas of science, scientists need to find distinct common characterizations of the same objects and, vice versa, to identify sets of objects that admit multiple shared descriptions. For example, in biology, an important task is to identify the bioclimatic constraints that allow some species to survive, that is, to describe geographical regions both in terms of the fauna that inhabits them and of their bioclimatic conditions. In data analysis, the task of automatically generating such alternative characterizations is called redescription mining. If a domain expert wants to use redescription mining in his research, merely being able to find redescriptions is not enough. He must also be able to understand the redescriptions found, adjust them to better match his domain knowledge, test alternative hypotheses with them, and guide the mining process toward results he considers interesting. To facilitate these goals, we introduce Siren, an interactive tool for mining and visualizing redescriptions. Siren lets users obtain redescriptions in an anytime fashion through efficient, distributed mining, examine the results in various linked visualizations, interact with the results either directly or via the visualizations, and guide the mining algorithm toward specific redescriptions. In this paper, we explain the features of Siren and why they are useful for redescription mining. We also propose two novel redescription mining algorithms that improve the generalizability of the results compared to the existing ones.

CCS Concepts: • Information systems → Data mining; • Human-centered computing → Visualization systems and tools; Graphical user interfaces

Similar resources

Siren: An Interactive Tool for Mining and Visualizing Geospatial Redescriptions [Demo]

We present Siren, an interactive tool for mining and visualizing geospatial redescriptions. Redescription mining is a powerful data analysis tool that aims at finding alternative descriptions of the same entities. For example, in biology, an important task is to identify the bioclimatic constraints that allow some species to survive, that is, to describe geographical regions in terms of both th...

Redescription Mining: Algorithms and Applications in Bioinformatics

Scientific data mining purports to extract useful knowledge from massive datasets curated through computational science efforts, e.g., in bioinformatics, cosmology, geographic sciences, and computational chemistry. In the recent past, we have witnessed major transformations of these applied sciences into data-driven endeavors. In particular, scientists are now faced with an overload of vocabula...

Redescription Mining: Structure Theory and Algorithms

We introduce a new data mining problem—redescription mining—that unifies considerations of conceptual clustering, constructive induction, and logical formula discovery. Redescription mining begins with a collection of sets, views it as a propositional vocabulary, and identifies clusters of data that can be defined in at least two ways using this vocabulary. The primary contributions of this pap...

A framework for redescription set construction

Redescription mining is a field of knowledge discovery that aims at finding different descriptions of similar subsets of instances in the data. These instances are characterized with descriptive attributes from one or more disjoint sets of attributes called views. By exploring different characterizations it is possible to find non trivial and interesting connections between different subsets of...

A Case of Visual and Interactive Data Analysis: Geospatial Redescription Mining

We present a method for visual and interactive geospatial redescription mining. The goal of geospatial redescription mining is to characterize geospatial areas using two different descriptions, such as their bioclimatic features and fauna. Indeed, one application of geospatial redescription mining is finding bioclimatic niches, i.e. explaining the distribution of species using their bioclimatic...

Publication date: 2016